Proactive error prediction to improve storage system reliability

Authors

  • Farzaneh Mahdisoltani
  • Ioan A. Stefanovici
  • Bianca Schroeder
Abstract

This paper proposes the use of machine learning techniques to make storage systems more reliable in the face of sector errors. Sector errors are partial drive failures, where individual sectors on a drive become unavailable, and they occur at a high rate in both hard disk drives and solid state drives. The data in the affected sectors can only be recovered through redundancy in the system (e.g., another drive in the same RAID) and is lost if the error is encountered while the system operates in degraded mode, e.g., during RAID reconstruction. In this paper, we explore a range of machine learning techniques and show that sector errors can be predicted ahead of time with high accuracy. Prediction is robust, even when only a small amount of training data, or only training data for a different drive model, is available. We also discuss a number of possible use cases for improving storage system reliability through the use of sector error predictors. We evaluate one such use case in detail: we show that the mean time to detect errors (and hence the window of vulnerability to data loss) can be greatly reduced by adapting the speed of a scrubber based on error predictions.
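The scrubber use case mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's method: the function names, the 0.5 threshold, the 4x cap, and the linear scaling rule are all hypothetical choices made for illustration. The delay estimate assumes an idealized scrubber that sweeps the drive at a constant rate and an error that appears at a uniformly random point in the sweep, so the average wait until detection is half of one full pass.

```python
def scrub_rate_multiplier(p_error, threshold=0.5, max_multiplier=4.0):
    """Map a predicted sector-error probability to a scrub-speed multiplier.

    Hypothetical policy: scrub at normal speed (1.0) below the threshold,
    then scale linearly up to max_multiplier as the probability nears 1.
    """
    if not 0.0 <= p_error <= 1.0:
        raise ValueError("p_error must be in [0, 1]")
    if not 0.0 <= threshold < 1.0:
        raise ValueError("threshold must be in [0, 1)")
    if p_error < threshold:
        return 1.0
    fraction = (p_error - threshold) / (1.0 - threshold)
    return 1.0 + fraction * (max_multiplier - 1.0)


def mean_detection_delay(full_pass_hours, multiplier):
    """Average time until a new error is found by a constant-rate scrubber.

    Idealized assumption: the error appears at a uniformly random moment,
    so the expected wait is half of one (sped-up) full pass.
    """
    return full_pass_hours / (2.0 * multiplier)


if __name__ == "__main__":
    for p in (0.2, 0.75, 1.0):
        m = scrub_rate_multiplier(p)
        print(p, m, mean_detection_delay(24.0, m))
```

Under these assumptions, a drive flagged with a high predicted error probability is scrubbed faster, which shortens the expected time an error stays undetected. A real policy would also have to weigh the extra I/O load that faster scrubbing imposes.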

Similar resources

Improving Storage System Reliability with Proactive Error Prediction

This paper proposes the use of machine learning techniques to make storage systems more reliable in the face of sector errors. Sector errors are partial drive failures, where individual sectors on a drive become unavailable, and occur at a high rate in both hard disk drives and solid state drives. The data in the affected sectors can only be recovered through redundancy in the system (e.g. anot...

Analysis of Probabilistic Error Checking Procedures on Storage Systems

Conventionally, error checking on storage systems is performed on-the-fly (with probability 1) as the storage system is being accessed in order to improve the reliability of the storage system. However, such a procedure may needlessly cause degraded performance due to the extra processing time needed for executing the error checking code. In this paper, we consider fault-tolerant storage system...

A Proactive Fault Tolerance Scheme for Large Scale Storage Systems

Facing an increasingly high failure rate of drives in data centers, reactive fault tolerance mechanisms alone can hardly guarantee high reliability. Therefore, some hard drive failure prediction models that can predict soon-to-fail drives in advance have been proposed. But few researchers have applied these models to distributed systems to improve reliability. This paper proposes SSM (Self-Scheduling...

Architecting Dependable Systems with Proactive Fault Management

Management of an ever-growing complexity of computing systems is an everlasting challenge for computer system engineers. We argue that we need to resort to predictive technologies in order to harness the system’s complexity and transform a vision of proactive system and failure management into reality. We describe proactive fault management, provide an overview and taxonomy for online failure p...

Prediction of fireball consequences caused by Boilover occurrence in the atmospheric storage tanks

Background and Objectives: Although boilover occurs with low frequency, when it does occur it can cause severe damage to people and equipment around the tank. Predicting the fireball of the boilover phenomenon plays an important role in adopting appropriate strategies for fire suppression in atmospheric storage tanks. The purpose of this study is to predict the consequence...

Publication date: 2017